Distributed-Pair Programming can work well and is not just Distributed Pair-Programming
Background: Distributed Pair Programming can be performed via screen sharing
or via a distributed IDE. The latter offers the freedom of concurrent editing
(which may be helpful or damaging) and has even more awareness deficits than
screen sharing. Objective: Characterize how competent distributed pair
programmers may handle this additional freedom and these additional awareness
deficits and characterize the impacts on the pair programming process. Method:
A revelatory case study, based on direct observation of a single, highly
competent distributed pair of industrial software developers during a 3-day
collaboration. We use recordings of these sessions and conceptualize the
phenomena seen. Results: 1. Skilled pairs may bridge the awareness deficits
without visible obstruction of the overall process. 2. Skilled pairs may use
the additional editing freedom in a useful, limited fashion, resulting in
potentially better fluency of the process than in local pair programming.
Conclusion: When applied skillfully in an appropriate context, distributed-pair
programming can (not will!) work at least as well as local pair programming.
Plagiarism in Take-home Exams: Help-seeking, Collaboration, and Systematic Cheating
Due to increased enrollments in Computer Science education programs, institutions have sought ways to automate and streamline parts of course assessment in order to be able to invest more time in guiding students' work. This article presents a study of plagiarism behavior in an introductory programming course, where a traditional pen-and-paper exam was replaced with multiple take-home exams. The students who took the take-home exams enabled a software plugin that recorded their programming process. During an analysis of the students' submissions, potential plagiarism cases were highlighted, and students were invited to interviews. The interviews with the plagiarism candidates revealed three types of plagiarism behaviors: help-seeking, collaboration, and systematic cheating. Analysis of programming process traces indicates that parts of such behavior are detectable directly from programming process data.
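The abstract leaves the detection method unspecified; as a minimal illustrative sketch (not the study's actual analysis), one signal recoverable from programming process data is a large amount of code appearing between two consecutive snapshots in implausibly little time. The snapshot format, thresholds, and function names below are invented for illustration.

```python
# Illustrative sketch: flag "paste-like" jumps in a recorded programming
# process, where many characters appear between two snapshots very quickly.

from dataclasses import dataclass

@dataclass
class Snapshot:
    timestamp: float   # seconds since the exam started
    source: str        # full file contents at this point in time

def paste_like_events(snapshots, min_chars=200, max_seconds=5.0):
    """Return (time, chars_added) for suspiciously large, fast insertions."""
    events = []
    for prev, cur in zip(snapshots, snapshots[1:]):
        added = len(cur.source) - len(prev.source)
        elapsed = cur.timestamp - prev.timestamp
        if added >= min_chars and elapsed <= max_seconds:
            events.append((cur.timestamp, added))
    return events

# Example: 300 characters appearing within two seconds is flagged.
trace = [Snapshot(0.0, ""),
         Snapshot(60.0, "x = 1\n"),
         Snapshot(62.0, "x = 1\n" + "y = 2\n" * 50)]
print(paste_like_events(trace))   # -> [(62.0, 300)]
```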
LittleDarwin: a Feature-Rich and Extensible Mutation Testing Framework for Large and Complex Java Systems
Mutation testing is a well-studied method for increasing the quality of a test suite. We designed LittleDarwin as a mutation testing framework able to cope with large and complex Java software systems while still being easily extensible with new experimental components. LittleDarwin addresses two existing problems in the domain of mutation testing: providing a tool that works within an industrial setting, yet remains open to extension with cutting-edge techniques from academia. LittleDarwin already offers higher-order mutation, null type mutants, mutant sampling, manual mutation, and mutant subsumption analysis. No other tool available today offers all of these features while being able to work with typical industrial software systems.
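LittleDarwin itself targets Java and its operators are not reproduced here; the following is a hedged, language-agnostic sketch of the core idea of mutation testing, using Python's ast module to generate first-order mutants by flipping relational operators and checking whether a test case kills each one.

```python
# Minimal mutation-testing sketch (Python 3.9+ for ast.unparse): each mutant
# flips exactly one comparison operator, then a test case decides kill/survive.
import ast

FLIPS = {ast.Lt: ast.GtE, ast.GtE: ast.Lt, ast.Gt: ast.LtE, ast.LtE: ast.Gt,
         ast.Eq: ast.NotEq, ast.NotEq: ast.Eq}

def mutants(source):
    """Yield one mutated source string per flipped comparison operator."""
    n_targets = sum(1 for n in ast.walk(ast.parse(source))
                    if isinstance(n, ast.Compare) and type(n.ops[0]) in FLIPS)
    for i in range(n_targets):
        tree = ast.parse(source)            # fresh tree for each mutant
        comps = [n for n in ast.walk(tree)
                 if isinstance(n, ast.Compare) and type(n.ops[0]) in FLIPS]
        comps[i].ops[0] = FLIPS[type(comps[i].ops[0])]()
        yield ast.unparse(tree)

code = "def is_adult(age):\n    return age >= 18\n"
for mutant in mutants(code):
    namespace = {}
    exec(mutant, namespace)
    # The mutant is "killed" if this test case now fails.
    killed = namespace["is_adult"](18) is not True
    print("killed" if killed else "survived", "->", mutant.strip())
```

A full framework would apply many more operators and run the entire test suite against each mutant to compute a mutation score; this sketch only shows the kill/survive decision for a single test.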
Managing plagiarism in programming assignments with blended assessment and randomisation
Plagiarism is a common concern for coursework in many situations, particularly where electronic solutions can be provided (e.g. computer programs), and leads to unreliability of assessment. Written exams are often used to try to deal with this, and to increase reliability, but at the expense of validity. One solution, outlined in this paper, is to randomise the work that is set for students so that it is very unlikely that any two students will be working on exactly the same problem set. This also helps to address the issue of students trying to outsource their work by paying external people to complete their assignments for them. We examine the effectiveness of this approach and others (including blended assessment) by analysing the spread of similarity scores across four different introductory programming assignments to find the natural similarity, i.e. the level of similarity that could reasonably occur without plagiarism. The results of the study indicate that divergent assessment (having more than one possible solution), as opposed to convergent assessment (only one solution), is the dominant factor in natural similarity. A key area for further work is to apply the analysis to a larger sample of programming assignments to better understand the impact of different features of the assignment design on natural similarity and hence the detection of plagiarism.
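As a hedged sketch of the randomisation idea (not the paper's actual implementation), each student's problem parameters can be derived deterministically from their ID, so variants are reproducible for marking yet very unlikely to coincide between students. All parameter names and value ranges below are invented for illustration.

```python
# Deterministic per-student problem variants: the same ID always yields the
# same parameters, so the marker can regenerate any student's problem set.
import hashlib
import random

def variant_for(student_id, n_tasks=3):
    """Generate per-student task parameters, seeded from the student ID."""
    seed = int(hashlib.sha256(student_id.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    tasks = []
    for _ in range(n_tasks):
        tasks.append({
            "array_length": rng.randint(8, 20),
            "target_value": rng.randint(0, 99),
            "sort_order": rng.choice(["ascending", "descending"]),
        })
    return tasks

print(variant_for("student-0042"))
```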
Open Science in Software Engineering
Open science describes the movement of making any research artefact available
to the public and includes, but is not limited to, open access, open data, and
open source. While open science is becoming generally accepted as a norm in
other scientific disciplines, in software engineering we are still struggling
to adapt open science to the particularities of our discipline, rendering
progress in our scientific community cumbersome. In this chapter, we reflect
upon the essentials of open science for software engineering, including what
open science is, why we should engage in it, and how we should do it. We
particularly draw on our experiences as conference chairs implementing
open science initiatives and as researchers actively engaging in open science
to critically discuss challenges and pitfalls, and to address more advanced
topics such as how and under which conditions to share preprints, which
infrastructure and licence model to choose, and how to do it within the
limitations of different reviewing models, such as double-blind reviewing. Our
hope is to help establish a common ground and to contribute to making open
science a norm also in software engineering.
An intuitive Python interface for Bioconductor libraries demonstrates the utility of language translators
Background: Computer languages can be domain-related, and in the case of multidisciplinary projects, knowledge of several languages will be needed in order to quickly implement ideas. Moreover, each computer language has relative strong points, making some languages better suited than others for a given task. The Bioconductor project, based on the R language, has become a reference for the numerical processing and statistical analysis of data coming from high-throughput biological assays, providing a rich selection of methods and algorithms to the research community. At the same time, Python has matured as a rich and reliable language for the agile development of prototypes or final implementations, as well as for handling large data sets. Results: The data structures and functions from Bioconductor can be exposed to Python as a regular library. This allows a fully transparent and native use of Bioconductor from Python, without one having to know the R language and with only a small community of translators required to know both. To demonstrate this, we have implemented such Python representations for key infrastructure packages in Bioconductor, letting a Python programmer handle annotation data, microarray data, and next-generation sequencing data. Conclusions: Bioconductor is no longer reserved solely to R users. Building a Python application using Bioconductor functionality can be done just as if Bioconductor were a Python package. Moreover, similar principles can be applied to other languages and libraries. Our Python package is available at: http://pypi.python.org/pypi/rpy2-bioconductor-extensions/
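The package builds on rpy2; assuming R, rpy2, and the Bioconductor package Biobase are installed, a minimal sketch of the underlying idea, calling a Bioconductor library from Python much like a native module, could look like this (the example data is random and purely illustrative):

```python
# Calling Bioconductor from Python via rpy2. importr loads the R package
# into the embedded R session and exposes its functions as attributes.
from rpy2.robjects.packages import importr
from rpy2.robjects import r

biobase = importr("Biobase")   # Bioconductor infrastructure package

# Build a small ExpressionSet in R and inspect it from Python.
eset = r('ExpressionSet(assayData = matrix(rnorm(12), nrow = 4))')
print(r("dim")(biobase.exprs(eset)))   # -> 4 rows, 3 columns
```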
A comparison of common programming languages used in bioinformatics
Background: The performance of different programming languages has previously been benchmarked using abstract mathematical algorithms, but not using standard bioinformatics algorithms. We compared the memory usage and speed of execution for three standard bioinformatics methods, implemented in programs using one of six different programming languages. Programs for the Sellers algorithm, the Neighbor-Joining tree construction algorithm and an algorithm for parsing BLAST file outputs were implemented in C, C++, C#, Java, Perl and Python. Results: Implementations in C and C++ were fastest and used the least memory. Programs in these languages generally contained more lines of code. Java and C# appeared to be a compromise between the flexibility of Perl and Python and the fast performance of C and C++. The relative performance of the tested languages did not change from Windows to Linux, and no clear evidence of a faster operating system was found. Source code and additional information are available from http://www.bioinformatics.org/benchmark/ Conclusion: This benchmark provides a comparison of six commonly used programming languages under two different operating systems. The overall comparison shows that a developer should choose an appropriate language carefully, taking into account the performance expected and the library availability for each language.
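As an illustration of the kind of measurement reported (not the study's actual harness), the following sketch times a plain dynamic-programming edit distance, a simplified stand-in for the Sellers algorithm, and records peak memory with tracemalloc; the sequence sizes are invented.

```python
# Toy benchmark harness: wall-clock time via time.perf_counter and peak
# allocation via tracemalloc, around a classic O(n*m) edit distance.
import random
import time
import tracemalloc

def edit_distance(a, b):
    """Dynamic-programming edit distance using two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

seq1 = "".join(random.choice("ACGT") for _ in range(2000))
seq2 = "".join(random.choice("ACGT") for _ in range(2000))

tracemalloc.start()
t0 = time.perf_counter()
dist = edit_distance(seq1, seq2)
elapsed = time.perf_counter() - t0
_, peak = tracemalloc.get_traced_memory()
print(f"distance={dist}  time={elapsed:.2f}s  peak_memory={peak / 1e6:.1f} MB")
```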
AZOrange - High performance open source machine learning for QSAR modeling in a graphical programming environment
Background: Machine learning has a vast range of applications. In particular, advanced machine learning methods are routinely and increasingly used in quantitative structure-activity relationship (QSAR) modeling. QSAR data sets often encompass tens of thousands of compounds, and the size of proprietary as well as public data sets is rapidly growing. Hence, there is a demand for computationally efficient machine learning algorithms that are easily available to researchers without extensive machine learning knowledge. In upholding the scientific principles of transparency and reproducibility, Open Source solutions are increasingly acknowledged by regulatory authorities. Thus, an Open Source, state-of-the-art, high-performance machine learning platform, interfacing multiple customized machine learning algorithms for both graphical programming and scripting, to be used for large-scale development of QSAR models of regulatory quality, is of great value to the QSAR community. Results: This paper describes the implementation of the Open Source machine learning package AZOrange. AZOrange is specially developed to support batch generation of QSAR models, providing the full workflow of QSAR modeling, from descriptor calculation to automated model building, validation and selection. The automated workflow relies upon the customization of the machine learning algorithms and a generalized, automated model hyper-parameter selection process. Several high-performance machine learning algorithms are interfaced for efficient, data set specific selection of the statistical method, promoting model accuracy. Using the high-performance machine learning algorithms of AZOrange does not require programming knowledge, as flexible applications can be created not only at a scripting level but also in a graphical programming environment. Conclusions: AZOrange is a step towards meeting the need for an Open Source high-performance machine learning platform, supporting the efficient development of highly accurate QSAR models that fulfil regulatory requirements.
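AZOrange itself builds on the Orange platform; as a hedged stand-in using scikit-learn, the following sketch illustrates the workflow the abstract describes, from a descriptor matrix through automated hyper-parameter selection to cross-validated model assessment. The random data stands in for computed molecular descriptors.

```python
# QSAR-style workflow sketch: descriptors -> automated hyper-parameter
# selection -> cross-validated model, with scikit-learn as a stand-in.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))          # 200 compounds, 30 mock descriptors
y = X[:, 0] * 2.0 - X[:, 1] + rng.normal(scale=0.1, size=200)  # mock activity

# Automated hyper-parameter selection over a small grid.
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [50, 200], "max_depth": [None, 10]},
    cv=5,
)
search.fit(X, y)

# Cross-validated accuracy (R^2) of the selected model.
scores = cross_val_score(search.best_estimator_, X, y, cv=5, scoring="r2")
print(search.best_params_, scores.mean())
```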
Deep Learning Application in Security and Privacy - Theory and Practice: A Position Paper
Technology is shaping our lives in a multitude of ways. This is fuelled by a
technology infrastructure, both legacy and state of the art, composed of a
heterogeneous group of hardware, software, services and organisations. Such
infrastructure faces a diverse range of challenges to its operations that
include security, privacy, resilience, and quality of service. Among these,
cybersecurity and privacy are taking centre stage, especially since the
General Data Protection Regulation (GDPR) came into effect. Traditional
security and privacy techniques are overstretched and adversarial actors have
evolved to design exploitation techniques that circumvent protection. With the
ever-increasing complexity of technology infrastructure, security and
privacy-preservation specialists have started to look for adaptable and
flexible protection methods that can evolve (potentially autonomously) as the
adversarial actor changes its techniques. For this, Artificial Intelligence
(AI), Machine Learning (ML) and Deep Learning (DL) were put forward as
saviours. In this paper, we look at the promises of AI, ML, and DL stated in
academic and industrial literature and evaluate how realistic they are. We also
put forward potential challenges that a DL-based security and privacy
protection technique has to overcome. Finally, we conclude the paper with a
discussion of the steps the DL and the security and privacy-preservation
communities have to take to ensure that DL is not just hype, but an
opportunity to build a secure, reliable, and trusted technology infrastructure
on which we can rely for so much in our lives.